Rapid evolution of promoters from germline-specifically expressed genes including transposon silencing factors

Background The piRNA pathway in animal gonads functions as an ‘RNA-based immune system’, serving to silence transposable elements and prevent inheritance of novel invaders. In Drosophila, this pathway relies on three gonad-specific Argonaute proteins (Argonaute-3, Aubergine and Piwi) that associate with 23–28 nucleotide piRNAs, directing the silencing of transposon-derived transcripts. Transposons constitute a primary driver of genome evolution, yet the evolution of piRNA pathway factors has not received in-depth exploration. Specifically, channel nuclear pore proteins, which impact piRNA processing, exhibit regions of rapid evolution in their promoters. Consequently, the question arises whether such a mode of evolution is a general feature of transposon silencing pathways. Results By employing genomic analysis of coding and promoter regions within genes that function in transposon silencing in Drosophila, we demonstrate that the promoters of germ cell-specific piRNA factors are undergoing rapid evolution. Our findings indicate that rapid promoter evolution is a common trait among piRNA factors engaged in germline silencing across insect species, potentially contributing to gene expression divergence in closely related taxa. Furthermore, we observe that the promoters of genes exclusively expressed in germ cells generally exhibit rapid evolution, with some divergence in gene expression. Conclusion Our results suggest that increased germline promoter evolution, in partnership with other factors, could contribute to transposon silencing and evolution of species through differential expression of genes driven by invading transposons. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10584-9.

Aubergine (Aub) and Argonaute-3 (Ago3).The piRNA pathway is often referred to as an 'RNA-based immune system' thanks to its ability to adopt to new TE insertions and the memory of past transposon activity through piRNA clusters through RNA transgenerational inheritance [1,2,9,10].
Channel nuclear pore proteins (Nups) have been implicated in piRNA regulated transposon silencing in C. elegans [17].Interestingly, in Drosophila, Nups were shown to affect transposon silencing in both the germline [16,18], and in somatic cells of the ovary, here specifically by contributing to the conversion of precursor RNAs from the flam locus into piRNAs [19].Recently, we found that inner and outer ring Nups in the nuclear pore complex (NPC) evolve rapidly through indel accumulation in their promoters [20,21].Such promoter indel variability in Nup54 has dominant, pleiotropic effects on sexual differentiation including neuronal wiring important for female post-mating behaviours directed by male-derived sex peptide, that as a result of sexual conflict could drive speciation by such mechanism [17,20,22].Intriguingly, nuclear import/export pathways have been linked to speciation through hybrid incompatibility, but are in addition important for many cellular processes beyond piRNA processing [23,24].Hence, promoter evolution particularly of piRNA processing genes could have wide impact on transposon silencing and evolution of species through differential expression of genes, but whether regulators of germline transposon silencing generally evolve rapidly through their promoters has not been determined.
Here, we analyse the promoters of genes involved in the ovarian piRNA pathway.We compare piRNA factors required in somatic cells (Yb) or in both the somatic and germline compartments (cTGS and piRNA biogenesis, Fig. 1A) with those that are germ cell-specific (dual-strand clusters and the ping-pong cycle, Fig. 1B).We find that the promoters of germline-specific piRNA factors in insects are hot spots for rapid evolution, while piRNA factors expressed in both somatic and germ cells evolve slower.Analysis of genes with significant accumulation of promoter changes reveals a mix of indel and base change accumulation.Further, our analysis reveals that rapid promoter evolution is a general feature of genes specifically expressed in germ cells, while somaexpressed genes evolve at a similar rate to the average of all Drosophila genes.Through cross-species differential ovary RNA-seq analysis we reveal that germ cell-specific genes minimally diverge in gene expression levels compared to genes expressed in somatic cells.Our findings highlight that promoters of germline genes involved in transposon silencing evolve rapidly and are accompanied by diverging gene expression, suggesting a possible mode of rapid speciation through accumulation of changes in promoters in correlation with additional regulatory factor divergence.
We defined the promoter containing region as a 350-nucleotide (-30 to -380) window upstream of the predicted TATA box region 30 nucleotides upstream of the transcription start site (TSS) [26].In a first approach, we performed a genome-wide promoter evolution analysis of germline-specific as well as somatic and germ cell piRNA factors and compared it to all genes in the Drosophila genome (Fig. 1C and D).We used publicly available PhyloP27way data as the conservation score.We observed increased conservation in promoter regions of piRNA factors expressed in somatic and germ cells, and lower conservation of germ cell-specific piRNA factors compared to the genome (Fig. 1C).Quantifying accumulation of changes revealed that germ cell-specific piRNA factors had accumulated significantly more sequence changes in promoter regions compared to the genome (Fig. 1D).Of note, the somatic and germ cell piRNA factor group evolved slower compared to the genome (Fig. 1D).
We next compared indel and base changes in the promoter regions between five closely related Drosophila species (D. melanogaster, D. simulans, D. sechellia, D. yakuba and D. erecta).Promoter scores (d) were calculated and compared to a control group comprised of the m 6 A methylation machinery genes (Mettl3, Mettl14, fl(2)d, virilizer, flacc, nito and Hakai, Ythdc1 and Ythdf), chosen due to their high evolutionary conservation and requirement for strict stoichiometry for functionality, making this an ideal control group to monitor promoter evolution [21,27,28].In this analysis, the group of somatic and germ cell piRNA factors showed similar evolution rates compared to the control (Fig. 1E-J), while germ cell-specific piRNA factors evolved at a higher rate (Fig. 1E and F), accruing a significantly increased number of promoter-located indels (Fig. 1G and H) and base changes (Fig. 1I and J).At the individual gene level, 7 out of 14 germ cell-specific piRNA factors showed significant accumulation of promoter changes (Supplementary Fig. 1A).Ago3 was omitted due to its low conservation between closely related Drosophila species [29][30][31].Among the somatic and germ cell piRNA factors, 3 out of 15 genes had significant accumulation of promoter changes (Supplementary Fig. 1A).
To assess general evolution in the coding regions, we used McDonald-Kreitman tests (MKTs) and analysed polymorphisms and divergence within D. melanogaster and between D. melanogaster and D. simulans for two ancestral populations (Congo and Zambia) [32].This analysis indicates that germ cell-specific piRNA factors are under positive selection (Supplementary Fig. 1B).Conversely, piRNA factors expressed in both somatic and germ cells were not under positive selection (Supplementary Fig. 1B).

Promoters of genes required for germline transposon silencing are fast evolving
Given the rapid evolution of germ cell-specific piRNA factors among closely related Drosophila species, we wanted to determine (1) whether this is a general feature of insect evolution, and (2) whether this is a general feature of genes required for transposon silencing with pleiotropic roles that have been identified in a genetic screen (GTS100 genes), since this provides a larger gene group [16].To get a scale of the persistence of evolutionary changes in promoters of GTS100 genes we used publicly available PhyloP27way data from UCSC genome browser [33,34].Genes were ordered based on transposon derepression scores [16] and d P was calculated for individual genes, the genome average as a control group, and for the average of GTS100 gene promoter regions (Fig. 2A-C,  3A).Genes without full PhyloP coverage were removed from the analysis (where ≥ 1 nucleotide of the analysed genomic region contained the D. melanogaster sequence only).
The promoter change score average of GTS100 genes showed a significant increase in promoter nucleotide changes compared to the control (Fig. 2C).Although many of these genes have additional roles, gene ontology (GO) analysis confirmed a primary role in piRNA binding (Fig. 2D) [35][36][37].
To understand whether GTS100 genes were individually fast evolving, individual d P scores were calculated (Fig. 3A).Compared to the genome average, 49 genes (FEPG, Fast Evolving Promoter Germline genes) were significantly increased in promoter change events, while 12 were significantly decreased (Fig. 3A).Of note Brother of Yb (BoYb) which is considered the germline replacement of Yb [3], also displayed a fast evolving promoter in contrast to its somatic counterpart Yb (Fig. 3A).To specifically look at indels and base changes at a high confidence level, we analysed FEPG genes for rapid promoter evolution between the previously used five Drosophila species and compared them to the somatic and germ cell piRNA factor control group (Fig. 3B-D).Here, indels and base changes were measured, showing a significant increase in both indels and base changes compared to the control (Fig. 3C and D).

A subset of piRNA factors that are part of distinct protein complexes display hot spots for rapid evolution in promoters
In a saturating genetic screen for regulators of transposon silencing in germ cells members of the Nup complex were identified as regulators of germline piRNA silencing, and later shown to rapidly evolve through their promoters [16,20,21].In the same screen, numerous genes whose products are part of distinct protein complexes were identified, including the nuclear cap-binding complex (CBC), THO complex, the exon junction complex (EJC) inner core, outer shell and transient factors, the non-specific lethal (NSL) complex, SUMOylation modifiers and the complex of proteins associated with Set1 (COMPASS) (Supplementary Fig. 2).Given the fast evolution of promoters in germline-specific piRNA pathway genes and the Nup complex, we analysed promoters of the additional protein complexes affecting germline piRNA silencing to see whether their rate of evolution matched that of other piRNA silencing factors.Compared to somatic and germ cell piRNA factors, promoter regions for the nuclear CBC, SUMO modifiers, COMPASS complex, NSL complex and EJC outer shell showed a significant accumulation of sequence changes (Fig. 3E and F).Analysis of individual genes from both significant and non-significant gene groups revealed that 43% of analysed genes have rapidly accumulated sequence changes in their promoters (Supplementary Fig. 2).Specifically looking at genes in fast evolving complexes (Fig. 3E and   F) revealed that 56% of genes in these groups rapidly evolve through accumulation of mutations in their promoters (Supplementary Fig. 2).
Next, we analysed the rate of evolution in the coding regions of these complexes using MKT tests [32].This analysis flagged the EJC outer shell, EJC transient factors, NSL complex, and SUMO modifiers as under positive selection in both populations, while the Nuclear CBC was under positive selection in the Zambia population only (Supplementary Fig. 4).

Analysis of piRNA factor differential gene expression in ovaries of D. melanogaster and D. yakuba
Since promoters of FEPG genes are fast evolving, we analysed publicly available RNA-seq data from ovaries between D. melanogaster and D. yakuba to assess gene expression divergence in the FEPG gene group compared to the somatic and germ cell piRNA factors (Fig. 4A) [38][39][40].Here, many FEPG genes had a tendency for slight significant divergence in gene expression in both directions (D. melanogaster or D. yakuba), but this was true for only few somatic and germ cell piRNA factors genes including arx (Fig. 4A).Notably, we observed large expression divergence for CG32152, MEP-1 and Rbbp5 (Fig. 4A).
Aligning the promoters of these three genes revealed an increased number of indels and base changes compared to the 5' UTR or the coding region (Fig. 4B-D), but the cohort was unfortunately too small to detect whether increased changes in promoters correlate with altered expression between the two species.Further limitations stem from the pleiotropic roles of the analysed genes, and therefore expression profiles likely differ across cell types in the analysed tissue.
To further analyse the promoters of these genes we performed motif enrichment analysis indicative of transcription factor binding sites in promoters of the somatic and germ cell piRNA factors (Fig. 1A), germ cell-specific piRNA factors (Fig. 1B), and the FEPG piRNA factor gene groups (Supplementary Fig. 5).To contextualise functionality of the enriched motifs we analysed publicly available female adult D. melanogaster ovary expression data.Of these, transcription/DNA binding factors Mothers against dpp (Mad), Boundary Element-Associated Factor of 32kD (BEAF-32) and DNA replication-related element factor (Dref ) were expressed in ovaries and their binding motifs were enriched in the FEPG piRNA factor gene group (Supplementary Fig. 5) [41][42][43][44].

Germ cell-specific genes show rapid evolution of promoters associated with divergent gene expression between species
To investigate whether the rapid evolution of promoters in piRNA factors expressed in germ cells is a common feature of germline-expressed genes, we used RNA-seq data from FACS-sorted vasa-GFP ovaries [45] to identify genes specifically expressed in either germ cells or somatic cells of the ovary.First, using differential RNA-seq analysis of FACS-sorted vasa-GFP-positive (germline) and vasa-GFP-negative (somatic) cells from ovaries of a transgenic vas-GFP D. melanogaster strain, we defined germline-enriched (vasa-GFP + ; log 2 fold change > 2, adjusted p value < 0.01) and soma-enriched (vasa-GFP-; log 2 fold change < -1, adjusted p value < 0.01) genes (Fig. 5A and B).
Next, using RNA-seq data from ovarian somatic cells (OSCs) (GSE160860) [46], we further filtered the somaenriched genes that are highly expressed in OSCs.Additionally, we further filtered the germline-enriched genes that show expression in early fly embryos (0-2h, modENCODE).In this way, we generated a stringent list of 107 genes specifically expressed in germ cells and of 59 soma-expressed genes (Supplementary Dataset 1).Of note, the soma-enriched gene group contained no piRNA factors expressed in both somatic and germ cells, while the germline-enriched gene group contained 7 from 107 genes identified as germ cell-specific piRNA factors (Supplementary Dataset 1).
Fig. 5 Germ cell-specifically expressed, but not somatic cell-specifically expressed gene promoters evolve rapidly and are accompanied by divergence in gene expression.A and B Ranking of genes (A) and volcano plot showing differential gene expression (B) using vasa-GFP signal from FACS-sorted vasa-GFP+/-cells from ovaries of transgenic vas-GFP D. melanogaster [45].Groups are defined as germline-enriched (vasa-GFP+) and soma-enriched (vasa-GFP-) genes.C and D PhyloP27way nucleotide scores for somatic cell-specific (C) and germ cell-specific (D) expressed genes ordered by promoter change d P scores.A region of 1000 nucleotides upstream and 300 nucleotides downstream are shown.Red represents lower conservation while blue represents higher.E Average PhyloP27way nucleotide scores for all genes in the Drosophila genome, somatic cell and germ cell-specific genes.Purple represents lower conservation while blue represents higher.F The percentage of somatic cell and germ cell-specific genes with significant promoter nucleotide changes (d P ) versus all genes in the Drosophila genome average, divided by significantly increased (low conservation) or decreased (high conservation).Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (**** p≤0.0001 following Bonferroni correction).G and H PhyloP27way conservation score averages (G) and PhyloP27way promoter change scores (d P ) (H) for the 350-nucleotide promoter regions compared for all genes in the Drosophila genome, somatic cell-specifically expressed and germ cell-specifically expressed genes.Statistically significant differences from unpaired student t-tests (G) and from non-parametric chi-squared tests (H) are indicated by asterisks (** p≤0.01,**** p≤0.0001 following Bonferroni correction) We then analysed sequence change accumulation in promoters using PhyloP data and calculated d P for all D. melanogaster genes and genes expressed specifically in somatic cells or the germline (Fig. 5C and D) or the average (Fig. 5E).Germ cell-specifically expressed genes significantly accumulated changes (d P ) in their promoters compared to soma-expressed genes (Fig. 5F).Comparison of the average promoter PhyloP scores and d P conservation scores between the genome average and somatic or germ cell-specifically expressed genes equally revealed a significant increase in germ cell gene promoter nucleotide changes compared to the genome average (Fig. 5G and H).
Since rapid promoter evolution of germline piRNA silencing factors correlated with gene expression divergence in our previous analysis, we hypothesised that this was a general feature of germ cell genes.To test this, we compared gene expression divergence between somatic cell-specific and germ cell-specific genes using publicly available RNA-seq data of D. melanogaster and D. yakuba ovaries (Fig. 6A) [38][39][40].Notably, genes specifically expressed in germ cells showed slightly higher divergence in expression compared to soma-expressed genes (Fig. 6A).When d P conservation scores were plotted against the gene expression change between D. melanogaster and D. yakuba, the germ cell-specifically expressed genes showed significant divergence in expression for all d P score groups compared to soma-expressed genes, though the distribution of log 2 fold changes in expression varied significantly for only the > 25-50 and > 50-75 groups (Fig. 6B-D, Supplementary Fig. 6).

Discussion
Through genomic analysis of germline and somatic transposon silencing genes and their regulators, we identify hot spots for rapid evolution in promoter regions (-30 to -380 nucleotides upstream of the TSS) of germ cell-specific (dual-strand clusters and the ping-pong cycle), but not soma-expressed piRNA pathway genes (cTGS and piRNA biogenesis).Further, we show that rapid promoter evolution is a general feature of germ cell-specific genes compared to those expressed only in the soma.Our analysis suggests that this mode of evolution in the germline pathway could be a general feature of at least insect evolution.Overall, our results point towards a key role for rapid evolution of gene promoters in the germ cellspecific piRNA pathway which could, coupled with other drivers of expression divergence, impact the expression of germline regulatory genes.
Transposon mobility has been attributed to the accumulation of sequence changes in promoter regions because of the presence of open chromatin around the TSS of genes [47].Interestingly, in core NPC genes, rapid evolution mostly led to accumulation of indels [20,21], while piRNA pathway genes display equal accumulation of indels and base changes.Likely, indels have more profound effects on changes in transcription than single nucleotide alterations.This feature might reflect that compromised transposon silencing causes sterility, hence not allowing for substantial changes in expression in piRNA processing genes.Likewise, stoichiometric changes in protein complexes can drastically alter complex functionality, e.g. the male-specific lethal (MSL) complex binds less to its targets when one component of the complex is missing [48,49].Hence, changes in the protein levels of individual piRNA factors likely have dominant effects on their capacity to silence transposon activity.
The occurrence of novel transposons can fundamentally impact species reproductive success through a phenomenon call hybrid dysgenesis [50].Here, if a novel transposable element is transmitted by the male, female fertility is severely compromised and can result in sterility as a result of missing piRNA silencing of the novel invader.Essential to combat these novel active transposable elements is the ping-pong cycle, primed by transcripts from the novel transposon.Accordingly, females can be become resistant to such novel transposons over time through RNA transgenerational inheritance of piR-NAs to build up ping-pong cycle amplification in germ cells [9,10,51].
Our examination of evolution in coding regions flagged germ cell-specific piRNA factors, as well as some pleiotropic piRNA processing associated complexes (EJC outer shell, EJC transient factors, NSL complex, SUMO modifiers, and Nuclear CBC) as under positive selection.This is unsurprising in the case of primary piRNA processing factors in the germline, as there are many examples of these factors being under selection [52][53][54][55][56][57][58][59][60].For instance, replacement of D. melanogaster rhino and deadlock genes with D. simulans homologues results in non-functionality [53].Here, D. simulans Rhino binding domains no longer bind to D. melanogaster Deadlock, resulting in failed localisation to piRNA clusters [53].Such effects resulting from few amino acid changes act as powerful driving forces in diverging piRNA processing.In this setting, rapid promoter evolution of piRNA silencing genes could act as an additional factor driving changes in expression levels and factor stoichiometry for piRNA processing divergence, which may be difficult to explore when coupled with functional divergence.Intuitively, one would associate rapid promoter evolution with changes in expression, however, we observed minimal effects on expression.Since transposon silencing is essential, changes in cis regulatory elements of these regulatory factors are likely constrained by compensatory mechanisms and could be accompanied by changes in enhancer regions.Whether this is the case would require a detailed knowledge of relevant enhancers and transcription factor binding.
The pattern of changes observed in fast evolving promoters like germline specific piRNA factors and Nups resembles the outcome of a P-element mutagenesis experiment in the egghead (egh) locus coding for a glycosphingolipid biosynthesis enzyme [61].Here, multiple base changes were observed in the region of the first promoter after P-element mutagenesis, but not an actual P-element insert, resulting in a sex peptide insensitive allele egh cm .Mutations in the egh gene result in pleiotropic phenotypes and the egh cm allele disrupts neuronal wiring required for the female post-mating response and development of the optic lobe [22,61,62].
Since establishing transposon resistance involves forced selection, genes whose expression changes result in pleiotropic effects, like channel Nup54, could combine adaptations in piRNA processing with changes in neuronal wiring resulting in altered behavioural preferences [20].Of note, in D. simulans, a close relative of D. melanogaster, projections of fruitless P1 sensory neurons that control courtship have changed and alter mate preference [63].
The severe impact of hybrid dysgenesis on fertility likely limits fast evolution to species with high numbers of progeny.Perhaps this can explain the differences seen when comparing insect and mammalian promoters that are generally more conserved [26].However, rapid promoter evolution has also been observed in primate promoters, suggesting a common mode of evolution that has yet to be explored [20,21,26,64].

Sequence/data retrieval and alignment
Gene and promotor sequences were retrieved from UCSC Genome Browser using the UCSC Table Browser sequence retrieval tool [33,34].A standardised region of 2000 bases upstream of the annotated gene TSS was exported for each gene to ensure inclusion of promoter regions.Sequences were imported and aligned with clustalW in MEGA11 [65].PhyloP27way data were sourced from UCSC genome browser (genome.ucsc.edu) through the Table Browser tool [33,34].Data points were collected for a region of 1000 nucleotides upstream and 300 nucleotides downstream for each of the analysed genes.Genes without full species coverage when analysing evolution between the 5 Drosophila species, where ≥ 1 species had no conserved genomic sequence were removed from the analysis.Genes without full Phy-loP coverage were removed from the analysis (where ≥ 1 nucleotide of the analysed genomic region contained the D. melanogaster sequence only).Genes present in more than one group in direct comparisons were removed from that comparison.

Molecular evolution of open reading frames analysis
Analysis of adaptive protein evolution was performed via MKTs [32] using the PopFly online database (imkt.uab.cat) which uses Drosophila Genome Nexus project sequence data [66][67][68].Polymorphism and divergence data were collected from the ancestral Congo and Zambia populations, chosen due to their high ancestral stability compared to other populations.The 'Standard MKT' test was used to calculate results for individual genes.Fisher's Exact Test with significance defined as FDR corrected p ≤ 0.05 was used to calculate significance between nonsynonymous to synonymous polymorphisms within D. melanogaster or between D. melanogaster and D. simulans.

Identification of promotor hot spots and comparison of substitution rates
All gene models were manually inspected in FlyBase using the JBrowse browser and dominant transcripts were chosen based on comparative modENCODE expression data [69].Gene and promoter sequence alignments were trimmed to -1000/ + 300 nucleotides from the TSS of each gene.To calculate the frequency of sequence change hotspots in promoter sequences, we calculated the hotspot accumulation score d using alignments generated as described, or PhyloP data.Alignments were translated into 'events' for each species compared to D. melanogaster, where for each nucleotide in the sequence, 0 signified a conserved sequence and 1 signified a sequence change (base change or indel event) [70].Events were calculated for all changes, as well as base changes and indels individually.The sum of events was calculated for concatenated gene groups and individual genes at each nucleotide position.A sliding event (Se) score was calculated from this using a sliding window of five bases along the sequence, from which heatmaps were generated.To calculate the percentage of events greater than the average control promoter Se score (d), a 350-nucleotide region upstream of the estimated TATA box region was analysed where the total number of Se scores exceeding the control group average sliding event score (Se C ) was divided by the total number of events in that region (N).
To calculate the promoter region d scores from Phy-loP data, the total number of PhyloP (p) scores in the 350-nucleotide regions less than the control group average promoter region (p C ) took the place of the total number of Se events where Se is greater than Se C (see equation below).
Significance was calculated using non-parametric chisquared tests compared to the control group d score.Significance values where p ≤ 0.05 with FDR or Bonferroni correction were considered statistically significant.

Fig. 1
Fig. 1 Promoter regions of germ cell-expressed piRNA processing factors are hot spots for rapid evolution.A and B Schematic depiction of somatic and germ cell piRNA factors involved in co-transcriptional gene silencing and piRNA biogenesis (A) and germline-specific piRNA pathway genes involved in dual-strand cluster regulation and the ping-pong cycle (B) in the Drosophila ovary.C and D PhyloP27way conservation score averages (C) and PhyloP27way promoter change d P scores (D) for the 350-nucleotide promoter regions of the somatic and germ cell piRNA factors and the germ cell-specific piRNA factors compared to all genes in the Drosophila genome.Statistically significant differences from unpaired student t-tests (C) and non-parametric chi-squared tests (D) are indicated by asterisks (*** p≤0.001, **** p≤0.0001 following Bonferroni correction).E-J Heatmaps indicating sequence (F, purple), indel (H, blue) or base change (J, green) accumulation, and their quantification (E-I) depicted as d scores for each of the analysed gene groups compared to the control among closely related D. melanogaster, D. simulans, D. sechellia, D. yakuba and D. erecta.Regions of 1000 nucleotides upstream and 300 nucleotides downstream of the TSS were analysed.Analysis was performed for the control group (m 6 A writer complex and readers), the somatic and germ cell piRNA factors, and the germ cell-specific piRNA factors.The blue line indicates the promoter region used for quantification of the substitution rate based on the gene and intergenic region schematic.Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (**** p≤0.0001 following Bonferroni correction)

Fig. 2
Fig. 2 Promoters of genes required for germline transposon silencing evolve fast.A PhyloP27way nucleotide scores for the top 100 genes affecting transposon silencing in germ cells (GTS100) based on transposon derepression z scores and ordered by promoter conservation d P scores.A region of 1000 nucleotides upstream and 300 nucleotides downstream are shown.Red represents lower conservation while blue represents higher.B Average PhyloP27way nucleotide scores for the GTS100 genes affecting transposon silencing in germ cells compared to the genome average.C Comparison of PhyloP27way promoter change scores (d P ) for each of the GTS100 genes affecting transposon silencing in germ cells compared to the genome average.Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (** p≤0.01 following Bonferroni correction).D Representative GO analysis for the GTS100 gene candidates affecting transposon silencing

Fig. 3
Fig. 3 Promoters of FEPG genes and their associated complexes are hotspots for rapid evolution.A Individual gene comparison of PhyloP27way promoter change scores (d P ) for GTS100 genes affecting transposon silencing in germ cells compared to the genome average.Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (* p≤0.05, ** p≤0.01, *** p≤0.001, **** p≤0.0001 following FDR correction).B-D Sequence analysis for change accumulation among closely related D. melanogaster, D. simulans, D. sechellia, D. yakuba and D. erecta.Comparisons were performed for sequence change (B), indel (C), and base change (D) d scores for the significantly increased GTS100 genes in the FEPG gene group compared to the somatic and germ cell piRNA factors.Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (**** p≤0.0001 following Bonferroni correction).E Comparison of d scores for protein complex genes involved in germ cell transposon silencing compared to the somatic and germ cell piRNA factors.Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (**** p≤0.0001 following Bonferroni correction).F Heatmaps indicating sequence change accumulation for protein complex genes involved in germ cell transposon silencing compared to the piRNA factors shown in purple among closely related D. melanogaster, D. simulans, D. sechellia, D. yakuba and D. erecta.Regions of 1000 nucleotides upstream and 300 downstream of the TSS were analysed.The blue line indicates the promoter region used for quantification of the substitution rate

Fig. 4
Fig. 4 Analysis of differential gene expression of FEPG genes in D. melanogaster and D. yakuba ovaries.A Volcano plot showing differential gene expression (DESeq2) between D. melanogaster and D. yakuba ovaries (RNA-seq data; Supplementary Dataset 1) for FEPG (blue) and the somatic and germ cell piRNA factors (red).Dashed lines indicate significance thresholds (log 10 adjusted p value <-0.5, log 2 fold change >0.5/<-0.5).B-D Alignment of the promoter regions of CG32152, MEP-1 and Rbbp5 for D. melanogaster and D. yakuba with indels (red) and base changes (blue) indicated as boxes.The black lines indicate gene 5' UTRs

Fig. 6
Fig. 6 Analysis of gene expression divergence between germ cell and somatic cell-specifically expressed genes.A Volcano plot of differential gene expression between D. melanogaster and D. yakuba ovaries (RNA-seq data; Supplementary Dataset 1) for all (grey), germ cell-specific (blue) and somatic cell-specific (red) gene groups.Dashed lines indicate significance (log 10 adjusted p value <-0.5, log 2 fold change >0.5/<-0.5).B Scatterplot of individual gene d P scores plotted against fold-change expression differences between D. melanogaster and D. yakuba ovaries (RNA-seq data; Supplementary Dataset 1) for germ cell-specific (blue) and somatic cell-specific (red) gene groups.The grey dotted lines indicate log 2 fold change threshold values of >0.5 and <-0.5.C and D Comparison of the expression distribution and percentage of genes with significant changes in D. melanogaster log 2 fold change (<-0.5, C) or significant changes in D. yakuba log 2 fold change (<0.5, D) expression changes in PhyloP conservation promoter score (d P ) ranges >25-50, >50-75, and >75-100.Expression changes for germ cell-specific and somatic cell-specific gene groups were analysed between D. melanogaster and D. yakuba.Statistically significant differences from non-parametric chi-squared tests are indicated by asterisks (** p≤0.01,*** p≤0.001, **** p≤0.0001 following FDR correction) d = Total number of Se events where Se > Se C N × 100 d P = Total number of p scores < p C N × 100